1.0  SUMMARY


In this project, a synthetic deoxyribonucleic acid (DNA)-based memory called
ComDMems (Combinatorial DNA Memories) was developed. This research focused on the
application and implementation of combinatorial-based information theory and group testing to
create associative DNA memories and to retrieve information stored in these DNA memories by
chemical and electro-chemical means.

This research demonstrates that this combinatorial method can feasibly yield billions of
covert and synthetic DNA memory strands that carry object and process information. A key
component of this innovation is the combinatorial method of bio-memory design and detection
that encodes item or process information as numerical sequences represented in DNA. This DNA
data structure can be read by the wet laboratory method polymerase chain reaction (PCR) and
then algorithmically decoded to retrieve a virtually unlimited amount of item or process
information that has been stored in the combinatorial memories.

ComDMem  is  a  content  addressable  memory  (CAM)  as  opposed  to  a  standard  random 
access  memory  (RAM).  A  standard  RAM  goes  directly  to  a  physical  address  and  returns  the 
contents.  A  CAM  uses  the  content  of  the  input  to  direct  the  search  of  its  entire  memory  for  the 
specified  data  word. 

ComDMem achieves CAM when multiple parallel PCR probes, specific for certain
pieces of information, search the ComDMem for memories that contain these pieces of
information. In this way all memories associated with a concept(s) can be retrieved and decoded
in parallel.
2.0  INTRODUCTION


In [1]-[7] it has been shown that the hybridization that occurs between a DNA strand and
its Watson-Crick complement can be used to perform mathematical computation. This research
addresses how the massive parallelism of DNA hybridization reactions can be exploited to
construct a DNA-based associative memory.

Single  strands  of  DNA  are  polymers  of  nucleotide  bases  adenine  (A),  cytosine  (C), 
guanine  (G)  and  thymine  (T)  and  thus  can  be  represented  by  sequences  of  the  letters  A,  C,  G,  and 
T.  DNA  sequences  have  an  orientation  that  reflects  the  asymmetric  covalent  linking  between 
consecutive  bases  in  the  DNA  strand  backbone;  e.g.,  5'AACG3'  is  distinct  from  5'GCAA3',  but  it 
is  identical  to  3'GCAA5'. 

DNA can be single-stranded (ssDNA) or double-stranded (dsDNA). ssDNA most easily
forms into a double-stranded helix with its oppositely directed reverse complement. To obtain
the 3'→5' reverse complement of a 5'→3' strand of DNA, substitute A with T and C with G and
vice versa. For example, the 3'→5' reverse complement of 5'TCGCA3' is 3'AGCGT5'. If x is a
DNA sequence, then let $\overline{x}$ denote its reverse complement in the opposing 3'→5' direction. For
example, $\overline{5'\mathrm{TCGCA}3'}$ = 3'AGCGT5'. Henceforth, strands without an overline are 5'→3' and
strands with an overline are 3'→5'. A dsDNA duplex formed between a strand and its reverse
complement is called a Watson-Crick (WC) duplex, e.g., 5'TCGCA3' paired with 3'AGCGT5'. Note
that non-WC duplexes can form and such a formation is called a cross-hybridization.
Cross-hybridizations are undesirable, so the synthetic DNA must be carefully designed to ensure
that a cross-hybridization never happens. The length of ssDNA or a dsDNA WC duplex is the number
of bases or base pairs (bp), respectively, in the strand. For example, TCGCA is called a 5-mer (mer
is short for polymer) and the length of the WC duplex of TCGCA with its reverse complement is 5 bp.
See Figure 1.
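The substitution rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the project software; the function names `complement_3to5` and `reverse_complement` are hypothetical.

```python
# Minimal sketch of the Watson-Crick complement rule (A<->T, C<->G).
# Function names are illustrative assumptions, not from the report.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_3to5(strand_5to3: str) -> str:
    """Base-by-base complement, read 3'->5' (aligned under the input strand)."""
    return "".join(COMPLEMENT[base] for base in strand_5to3)


def reverse_complement(strand_5to3: str) -> str:
    """The same complementary strand, rewritten in the 5'->3' direction."""
    return complement_3to5(strand_5to3)[::-1]


print(complement_3to5("TCGCA"))     # AGCGT, i.e. 3'AGCGT5'
print(reverse_complement("TCGCA"))  # TGCGA, i.e. 5'TGCGA3'
```

Both outputs describe the same physical strand; only the reading direction differs.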
Coding Strands for Ligation (5'→3')    Probing Complement Strands for Reading (5'→3')
TACGCGACTTTC                           GAAAGTCGCGTA
ATCAAACGATGC                           GCATCGTTTGAT
TGTGTGCTCGTC                           GACGAGCACACA
ATTTTTGCGTTA                           TAACGCAAAAAT
CACTAAATACAA                           TTGTATTTAGTG
GAAAAAGAAGAA                           TTCTTCTTTTTC

Must Have: Watson Crick (WC) Duplexes (a coding strand paired with its own complement, e.g., 5'TACGCGACTTTC3')
Must Avoid: Cross Hybridized (CH) Duplexes (a strand paired with anything other than its own complement)

Figure  1:  A  DNA  Code 


Hybridization  assays  offer  the  possibility  of  simultaneously  processing  trillions  of  bits  of 
information.  In  DNA  hybridization  assays  for  biomolecular  computing,  concatenated  DNA 
strands  can  be  used  for  multiple  purposes.  They  can  be  used  to  store,  write,  read  and  retrieve 
information.  Hybridization  assays  with  DNA  strands  are  also  used  to  separate,  manipulate, 
identify  and  address  molecules  in  many  other  important  experiments  beyond  biomolecular 
computing [9]-[11]. Figure 2 shows a CAM concept using DNA for the probe and store.


[Figure 2 panels: Stimulus; Total Memory; Associated Memory]

Figure  2:  Scheme  for  Parallel  Search  for  Multiple  Associations  in  DNA 
In DNA biomolecular computing, occasions can arise where a sample containing several
distinct sequences of DNA needs to be analyzed. For example, each individual sequence in a
mixture of DNA could:

(i) encode a solution to a mathematical problem [12],

(ii) be stored information associated with an entity [13], or

(iii) be a taggant or label associated with a target [14].

In these cases, the composition of each DNA strand in the mixture needs to be determined so
that each mathematical solution, memory and/or target can be respectively retrieved. This
research shows how a single, parallel battery of reactions performed on a mixed DNA sample
containing an arbitrary subset of several double-stranded DNA sequences can be used to
determine the composition of each sequence in the mixture.

Further, this research demonstrates that the combinatorial method employed can feasibly
yield billions of covert and synthetic DNA memory strands that carry object and process
information. A key component of the innovation is the combinatorial method of bio-memory
design and detection that encodes product, item or process information as a numerical sequence
represented in DNA. This DNA data structure can be read by the wet laboratory method
polymerase chain reaction (PCR), whose output can also be converted into an electrical signal, and then
algorithmically decoded to retrieve a virtually unlimited amount of item or process information
that has been stored in the combinatorial memories. In Figure 3, data is encoded using DNA
substrands, with the whole library strand containing related associations, i.e., "a memory."
[Figure 3 content: a relational table with the fields Last Name, Eye Color, Friend (or Foe) and
Primary Occupation, in which each field value (e.g., Newman, Blue, Actor) is assigned a 12-mer DNA
substrand. Other labels in the figure: PCR primer, memory strand.]

Figure  3:  The  DNA  Associative  Array  Relational  Table 
3.0  METHODS,  ASSUMPTIONS,  PROCEDURES


3.1  Numerical  Sequences  Represented  in  DNA 

Throughout  the  remainder  of  this  report  all  lower  case  variables  are  natural  numbers,  e.g., 
n,  q,  s,  and  t. 

A fixed set of n·q relatively short t-mers of ssDNA is called a t-DNA n×q table code
and is denoted by DNA_TC(n,q,t). See Figure 4 for an example of a DNA_TC(5,2,10), where n
is the number of positions along the long strand, q is the number of rows and t is the length of
each substrand. The sequences in a given DNA_TC(n,q,t) are called table-mers. A ssDNA memory
library is the collection of $q^n$ relatively long (n·t)-mer strands of ssDNA that are concatenated
from a fixed DNA_TC(n,q,t). A member of a ssDNA memory library is called a ssDNA memory.

Key  Idea:  Any  finite  numeric  sequence  can  be  encoded  as  a  ssDNA  (or  dsDNA)  memory  and 
vice-versa. 

For example, using the table-mers from Figure 4, the binary sequence 01101 is encoded
as CGTCCATCGT CGCAAGCTGA AGTGGATGCG TCGGTAAGCG TCGGAGTGCT. This encoding is possible
because only certain collections (partitioned by font type) of sequences are allowed to be in each
position (e.g., Arial = position 0, Comic = position 1, etc.) and within each collection, distinct
strands are assigned distinct numerical values (e.g., CGTCCATCGT = 0, GCAGAAGCCA = 1
for position 0). It is straightforward to see that table-mers can be used to make a table that in
turn can be concatenated to make $q^n$ distinct longer DNA memories encoding each numeric
sequence with n digit positions where each digit can range from 0 to q-1.
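The encoding just described can be sketched in Python as a table lookup followed by concatenation. This is an illustrative sketch only: the names `TABLE`, `encode` and `decode` are hypothetical, and the two position-1 table-mers are written with G where the extracted figure shows an unreadable glyph, which is an assumption consistent with the encoded example above.

```python
# Sketch of numeric-sequence <-> ssDNA memory encoding with the DNA_TC(5,2,10)
# of Figure 4. TABLE[position][value] is the table-mer for (position, value).
# Assumption: unreadable glyphs in the position-1 entries are read as G.
TABLE = [
    ["CGTCCATCGT", "GCAGAAGCCA"],  # position 0
    ["CATTCGCGGA", "CGCAAGCTGA"],  # position 1
    ["ACAGTTGCCG", "AGTGGATGCG"],  # position 2
    ["TCGGTAAGCG", "TGCACGAGAC"],  # position 3
    ["GAGCGAACCA", "TCGGAGTGCT"],  # position 4
]
T = 10  # length of each table-mer


def encode(digits: str) -> str:
    """Concatenate the table-mer selected by the digit at each position."""
    return "".join(TABLE[pos][int(d)] for pos, d in enumerate(digits))


def decode(memory: str) -> str:
    """Split a memory into t-mers and look each one up at its position."""
    mers = [memory[i:i + T] for i in range(0, len(memory), T)]
    return "".join(str(TABLE[pos].index(mer)) for pos, mer in enumerate(mers))


memory = encode("01101")
print(memory)          # 50-mer built from five table-mers
print(decode(memory))  # 01101
```

Because every position has its own disjoint set of table-mers, the 32 binary strings of length 5 map to 32 distinct 50-mers.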


     position 0   position 1   position 2   position 3   position 4
0    CGTCCATCGT   CATTCGCGGA   ACAGTTGCCG   TCGGTAAGCG   GAGCGAACCA
1    GCAGAAGCCA   CGCAAGCTGA   AGTGGATGCG   TGCACGAGAC   TCGGAGTGCT

Figure  4:  A  DNA_TC(5,2,10) 
Each table-mer in DNA_TC(5,2,10) in Figure 4 can be labeled by an ordered pair
(position, value). The first coordinate corresponds to the position and the second coordinate
corresponds to the value. Font type only indicates position. For example, (0,1) = GCAGAAGCCA,
while (2,0) = ACAGTTGCCG.


For  every  ssDNA  memory  there  is  a  corresponding  dsDNA  memory  that  is  the  unique 
WC  duplex  that  contains  the  ssDNA  memory.  See  Figure  5. 


5' CGTCCATCGT CGCAAGCTGA AGTGGATGCG TCGGTAAGCG TCGGAGTGCT 3'
3' GCAGGTAGCA GCGTTCGACT TCACCTACGC AGCCATTCGC AGCCTCACGA 5'


Figure 5: A "longer strand" double-stranded WC ComDMem


Henceforth, a ssDNA memory is identified with the unique WC dsDNA memory that
contains it, and the term DNA memory means WC dsDNA memory. For a given
DNA_TC(n,q,t) table code M, let MEM_LIB(n,q,n·t) of M denote the collection of $q^n$ possible
double-stranded n·t bp memories that can be formed by concatenation, where each DNA
memory is identified by its top 5'→3' strand. For example, the DNA memory in Figure 5 is a
member of MEM_LIB(5,2,50) of Figure 4.


3.2  Polymerase  Chain  Reaction  Laboratory  Method 

Polymerase  chain  reaction  (PCR)  is  a  technique  widely  used  in  molecular  biology, 
forensic  science,  environmental  science,  and  many  other  areas  [15-16].  Briefly,  PCR  is  a  test 
tube  system  that  exponentially  replicates  a  substrand  of  a  DNA  memory  that  is  delimited  by  two 
sequence  specific  recognition  sites  (e.g.,  table-mers)  which  are  found  at  the  ends  of  the  substrand 
to  be  selectively  amplified.  By  incubating  a  DNA  memory  mixture  with  oligonucleotide 
recognition  site  PCR  primers  and  the  enzyme  DNA  polymerase,  the  presence  of  a  pair  of 
recognition  sites  on  a  common  substrand  of  a  DNA  memory  can  be  determined  by  whether  or 
not  a  PCR  amplification  occurs. 
Key Idea: This PCR amplification information can be mathematically exploited to decode
layered  memories. 

A  standard  method  for  detection  of  amplification  involves  an  electrical  separation  and 
detection  of  DNA  substrands  on  a  size  separation  media  called  a  gel.  There  are  other  more 
sensitive  and  faster  (e.g.,  real-time  PCR)  methods  that  automate  the  entire  PCR  protocol  and  can 
detect  amplification.  These  instruments  can  very  reliably  provide  the  information  needed  to 
conduct  the  mathematical  algorithms  in  a  cost  effective  manner. 

3.3  Memory  Design  and  Synthetic  DNA  Code  SynDCode  Software 

The  decoding  accuracy  of  DNA  memories  by  the  PCR  method  depends  upon  whether  or 
not  so-called  false  priming  sites  exist  in  the  memories.  The  priming  sites  for  this  method  are  the 
table-mers  used  to  construct  the  memories.  False  priming  site  sequences  can  arise  if  two  or  more 
of  the  table-mers  are  too  similar  or  if  the  memory  sequence  regions  that  overlap  the  junctions 
where  table-mers  are  concatenated  are  too  similar  to  the  original  table  sequences. 

The synthetic DNA code software, SynDCode [17], is a tool developed to design synthetic
DNA sequences to be used in biologically based information systems (e.g., DNA computing,
DNA memory and DNA nanodevices). SynDCode allows for the specification
of  thermodynamic  distance  and  dissimilarity  so  that  the  synthetic  table-mers  (and  their 
complements)  do  not  create  false  priming  sites.  The  table-mers  in  Figure  4  were  designed  by 
SynDCode  to  be  non-complementary  and  non-cross-hybridizing  so  that  each  position  in  a 
memory  library  strand  will  be  (ultra)  specific  for  a  unique  PCR  primer.  The  fact  that  SynDCode 
gives  non-cross-hybridizing  output  has  been  experimentally  verified  repeatedly  in  the  laboratory. 
Enhanced  SynDCode  strand  design  optimization  methods  were  developed  in  [25-28]. 
3.4  The PCR Signal and PCR Network Graph

As a small example, consider the table-mers in Figure 4 and all 32 distinct memories (one for
each 0,1 string of length 5) in MEM_LIB(5,2,50) formed from Figure 4. For the general
memory library MEM_LIB(n,q,n·t), there are $n(n-1)q^2/2$ primer pairs of table-mers and thus the
same number of distinct PCR reactions, with each memory being positive for exactly $n(n-1)/2$ of
them. In the above example, n = 5 and q = 2, so there are 40 distinct PCR reactions to perform
with any given memory being positive for 10 of them. Forty may seem like many reactions, but
current PCR technology allows for 768 simultaneous reactions (e.g., Applied Biosystems
Auto-Lid Dual 384-Well GeneAmp® PCR System 9700).

Figure 6(a) below is a graphical interpretation, called a PCR network graph, of all
possible PCR reactions from primer pairs of table-mers from Figure 4. The lines connecting the
nodes in the graph denote all possible primer pairs. Notice that there are no lines between
nodes with the same first coordinate (e.g., (4,0) and (4,1)). This is because no single memory can
have two distinct table-mers at the same position. In Figure 6(b), the set of bold lines denotes the
set of positive PCR reactions for the DNA memory represented by 01101.
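The reaction counts quoted above can be checked by enumerating the PCR network graph directly. The following Python sketch is illustrative; `all_reactions` and `positive_reactions` are hypothetical names, and table-mers are represented only by their (position, value) labels.

```python
from itertools import combinations, product


def all_reactions(n: int, q: int) -> set:
    """Every primer pair of table-mers taken from two different positions."""
    vertices = list(product(range(n), range(q)))  # (position, value) labels
    return {
        frozenset(pair)
        for pair in combinations(vertices, 2)
        if pair[0][0] != pair[1][0]  # no edge inside a single position
    }


def positive_reactions(digits: str) -> set:
    """Primer pairs that amplify on the single memory encoded by `digits`."""
    vertices = [(pos, int(d)) for pos, d in enumerate(digits)]
    return {frozenset(pair) for pair in combinations(vertices, 2)}


reactions = all_reactions(5, 2)
positives = positive_reactions("01101")
print(len(reactions), len(positives))  # 40 10
```

Each undirected edge here stands for one PCR reaction; the edge between (4,0) and (4,1) is absent because both labels share position 4.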


(a) 


(b) 


Figure  6:  PCR  Network  Graph 
Key Idea: By using smaller DNA fragments that mathematically constitute what is known as a
combinatorial cover (hence the name ComDMem), the same PCR network graph information can
be obtained that would be received from a longer DNA memory.

Let M be a fixed collection of table-mers DNA_TC(n,q,t). An s-DNA cover of M is a
collection of double-stranded WC duplex concatenations of s table-mers taken from M that
yield exactly the same positive PCR reactions that exist for the entire memory library
MEM_LIB(n,q,n·t) for M. A DNA sequence in an s-DNA cover is called a covering strand.
Since the length of such a covering strand is s·t bp, the s-DNA cover of M is called a
COV_DNA(n,q,s·t) of M. A COV_DNA(n,q,s·t) of M is also referred to as an s-DNA cover of
the memory library MEM_LIB(n,q,n·t) constructed from M.

Key  Idea:  By  using  DNA  covers  of  DNA  memory  libraries,  a  virtual  memory  can  be  constructed, 
i.e.,  ComDMems,  that  behave  exactly  like  real  (and  longer)  memories  in  the  library  with  respect 
to their PCR signal. Thus, for MEM_LIB(n,q,n·t), instead of having to painstakingly construct $q^n$
memories, one can construct approximately  strands in COV_DNA(n,q,s·t) and get the same

results  by  algorithmic  mixing  to  make  the  ComDMems.  This  amounts  to  a  feasible  fold  cost 
reduction.  For  example,  with  n  =  10,  q  =  2,  s  =  3,  the  reduction  is  approximately  100  fold. 
Moreover, the physical construction of long DNA memory sequences when n·t is greater than
200 is virtually impossible. Thus, to get massive amounts of data storage capability,
COV_DNA(n,q,s·t) must be used.

For example, consider the MEM_LIB(5,2,50) constructed from Figure 4 and let C be a
COV_DNA(5,2,30) 3-cover. The four covering strands cs1, cs2, cs3 and cs4 in C that appear
below in Figure 7 together constitute a virtual ComDMem memory for the actual memory that
appears in Figure 5 above.
cs1 = CGTCCATCGT CGCAAGCTGA AGTGGATGCG
cs2 = CGTCCATCGT CGCAAGCTGA TCGGAGTGCT
cs3 = AGTGGATGCG TCGGTAAGCG TCGGAGTGCT
cs4 = CGTCCATCGT CGCAAGCTGA TCGGTAAGCG

(Each covering strand is the WC duplex of the listed 5'→3' strand with its reverse complement.)

Figure  7:  Covering  Strands  for  Memory  in  Figure  5 

The virtual aspect of the collection of the four covering strands can be observed in Figure
8. Each of the four covering strands gives rise to three positive PCR reactions. For example, cs3
has positive PCR reactions for the primer pairs in the triangle (3,0), (4,1), (2,1) whose lines are
shaded with the shorter dashes (- -). The triangle of edges that are positive for each covering
strand cs_i is shaded according to the line type associated with cs_i in Figure 7. Note that the line
between (0,0) and (1,1) appears in three of the four triangles and is thus partially highlighted by
three different shadings. Comparing Figure 8 to Figure 6(b), it can be observed that cs1, cs2, cs3
and cs4 in total give the same ten positive PCR reactions as does the single longer memory that
they cover.

Key  Idea:  From  the  point  of  view  of  the  positive  PCR  reactions,  the  single  longer  memory  is 
indistinguishable  from  the  mixture  of  the  covering  strands,  i.e.,  the  virtual  memory  ComDMem. 
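The covering property can be verified mechanically. The Python sketch below is illustrative only: it represents each covering strand of Figure 7 by the (position, value) labels of its three table-mers, read off from Figure 4, and the helper name `edges` is hypothetical.

```python
from itertools import combinations


def edges(vertices) -> set:
    """All primer pairs (positive PCR reactions) among a set of table-mers."""
    return {frozenset(pair) for pair in combinations(sorted(vertices), 2)}


# The long memory 01101 of Figure 5 as (position, value) labels.
memory = {(0, 0), (1, 1), (2, 1), (3, 0), (4, 1)}

# The four covering strands of Figure 7, labeled via the table in Figure 4.
cover = {
    "cs1": {(0, 0), (1, 1), (2, 1)},
    "cs2": {(0, 0), (1, 1), (4, 1)},
    "cs3": {(2, 1), (3, 0), (4, 1)},
    "cs4": {(0, 0), (1, 1), (3, 0)},
}

covered = set().union(*(edges(cs) for cs in cover.values()))
print(covered == edges(memory))  # True: same ten positive reactions

shared = frozenset({(0, 0), (1, 1)})
print(sum(shared in edges(cs) for cs in cover.values()))  # 3
```

The second check reproduces the observation that the line between (0,0) and (1,1) lies in three of the four triangles.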


Figure  8:  PCR  Results  from  Covering  Strands  in  Figure  7 
(a)  (b)

Figure  9:  PCR  Graphs  of  Solely  Positive  PCR  Reactions 


When  a  graphical  representation  of  PCR  reactions  is  given,  only  lines  that  denote  positive 
PCR  reactions  need  to  be  given.  Using  this  representation  Figure  6(b)  becomes  Figure  9(a). 

Using the binary representation of MEM_LIB(5,2,50), Figure 9(b) gives the positive
PCR reactions for the group 11000, 00110, 11100 and 11110 of four layered memories. Note that,
theoretically, Figure 9(b) would be the same if either the four actual MEM_LIB(5,2,50)
sequences 11000, 00110, 11100 and 11110 of 50 bp, or the sixteen covering strands of 30 bp in
COV_DNA(5,2,30) that covered each of the memories 11000, 00110, 11100 and 11110, were
combined.

Key Idea: The physical manufacture of all the DNA memories in a MEM_LIB(n,q,n·t) is an
extremely costly, low-yield and sometimes impossible endeavor, especially for large n and t. With
this combinatorial innovation, one can get the same benefits by using COV_DNA(n,q,s·t) with a
feasible fold reduction in cost.
3.5  Oligo  Visualization  with  3DViews

A  task  was  added  to  the  project  to  help  bridge  the  gap  between  the  virtual  design  and 
expected  interaction  of  short  DNA  strands  and  the  physical  implementation  and  real  interactions 
in  a  physical  experiment.  A  physical  model  of  the  DNA  strand  and  strand  to  strand  interactions 
was  created.  A  graphical  user  interface  was  created  to  allow  designers  to  visualize  the  complex 
physical  structures  and  interactions  of  DNA  systems.  A  large  scale  tiled  computer  display 
system  was  built  to  provide  the  large  display  area  with  high  pixel  resolution  needed  to  display 
the DNA interactions. The completed hardware / software / interaction model system
was called 3DViews and is described in the results section.
4.0  RESULTS,  DISCUSSION


4.1  The  Mathematical  Model

For $2 \le n, q$ let $V_{n,q}$ be the set of all ordered pairs $(p, v_p)$ where $p \in [n]$ and $v_p \in [q]$.
An n-set in $V_{n,q}$, $\{(p, v_p)\}_{p \in [n]}$, where the first coordinates are distinct, can be uniquely
identified with an element of $[q]^n$ and vice versa. Under this bijection, each $x \in [q]^n$,
$x = v_0 \ldots v_{n-1}$, is identified with the n-set of ordered pairs in $V_{n,q}$, $x = \{(0,v_0),\ldots,(n-1,v_{n-1})\}$,
where the first coordinate designates the position in the sequence and the second coordinate
represents the value at that position. For example, {(2,0), (0,1), (1,3)} corresponds to 130.
Henceforth, $[q]^n$ denotes the n-sets in $V_{n,q}$ where the first coordinates are distinct.

$E_{n,q}$ denotes the set of all pairs $\{(p_1,v_1),(p_2,v_2)\}$ in $V_{n,q}$ where $p_1 \ne p_2$. Then $E_{n,q}$
is the set of all edges in the n-partite graph $G_{n,q}$ on the vertex set $V_{n,q}$ where the independent
sets are collections of vertices with the same first coordinate. Further identify
$x = \{(0,v_0),\ldots,(n-1,v_{n-1})\}$ with the complete subgraph, denoted $K_x$, of $G_{n,q}$ on the vertices in
$x \in [q]^n$.

The correspondence between the mathematical and physical entities is as follows: $V_{n,q}$ is
identified with the set of table-mers S, MEM_LIB(n,q,n·t) is identified with $[q]^n$ and $E_{n,q}$ is
identified with all possible PCR reactions. This latter identification is less obvious than the others.
A pair of primers consisting of the table-mer $s_{p_1,v_1}$ at $(p_1,v_1)$ and the complement of the
table-mer $s_{p_2,v_2}$ at $(p_2,v_2)$, where $0 \le p_1 < p_2 \le n-1$, corresponds to a unique PCR reaction. Then,
identifying this primer pair with $\{(p_1,v_1),(p_2,v_2)\}$, the identification of $E_{n,q}$ and PCR reactions is observed.
Using these identifications, given a pool of sequences U from $[q]^n$,
consider an edge $e = \{(p_1,v_1),(p_2,v_2)\}$ in $E_{n,q}$. Say that e is positive for U if and only if there
is an $x \in U$ such that x has value $v_1$ in position $p_1$ and value $v_2$ in position $p_2$. Considering U as a
pool of dsDNA strands taken from MEM_LIB(n,q,n·t) and considering e as the PCR reaction
primed by the table-mer labeled $(p_1,v_1)$ and the complement of the table-mer labeled $(p_2,v_2)$,
then e is positive for pool U if and only if an exponential amplification
results from exposing the sample U to that PCR reaction. Experimentally, this
exponential amplification can be observed in many ways, including conventional gel-based
PCR and SYBR Green and/or TaqMan-based real-time PCR [12], [16].

Finally, given a pool of sequences U from $[q]^n$, let $G_U$ denote the
subgraph of $G_{n,q}$ that consists of all the edges positive for U. $G_U$ is the graph-theoretic union of
the complete subgraphs $K_x$ for $x \in U$. If U is considered to be a pool of dsDNA strands taken from
MEM_LIB(n,q,n·t), then $G_U$ is identified with the collection of all positive PCR reactions
taken over all possible pairs of primers. In either the mathematical or physical setting,
the goal is to identify U given $G_U$. The interesting applications come from the fact that $G_U$ can
be obtained from experimentation without direct knowledge of the contents of U.

Consider the set of strands S given in Figure 4. A description of the actual physical
construction of the dsDNA library MEM_LIB(5,2,50) appears in [8]. Suppose a pool U taken from
MEM_LIB(5,2,50) consists of the duplexes identified by 11000, 00110, 11100 and 11110.

Then $G_U$ is given in Figure 10, the graph $G_U$ depicting all the positive PCR reactions from
Figure 4.
Figure 10: PCR graph $G_U$ with U={11000, 00110, 11100, 11110}.

A closer look at an aspect of Figure 10 can aid the discussion. Consider the edge {(1,1),
(4,0)}. From Table 2, this edge corresponds to the PCR reaction primed by
$s_{1,1}$ = TCACACACACACACACAATT and the complement of $s_{4,0}$ = TCTCCTCTCCACTCAAAACC. This
PCR reaction yields amplification because the dsDNA strands 11000 and 11100 are members of
U that have the values 1 and 0 in the 1st and 4th positions respectively. (The position count starts
with 0.) Note that the PCR reaction primed by $s_{0,0}$ = CCAAACCTCCACTTTCCAAC and the
complement of $s_{2,0}$ = CCTTTCCTCCATCACCTCAT, corresponding to edge {(0,0), (2,0)}, does not
yield an amplification because no strand in U={11000, 00110, 11100, 11110} has the value 0 in
both the 0th and 2nd positions.
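The two reactions discussed above can be reproduced by building the PCR graph of the pool as a union of complete subgraphs. This Python sketch is illustrative; `k_edges` and `pcr_graph` are hypothetical names, and strands are represented by their binary labels rather than by DNA sequences.

```python
from itertools import combinations


def k_edges(x: str) -> set:
    """Edges of the complete subgraph K_x for a sequence x in [q]^n."""
    vertices = [(pos, int(v)) for pos, v in enumerate(x)]
    return {frozenset(edge) for edge in combinations(vertices, 2)}


def pcr_graph(pool) -> set:
    """G_U: the graph-theoretic union of K_x over all x in the pool U."""
    graph = set()
    for x in pool:
        graph |= k_edges(x)
    return graph


U = {"11000", "00110", "11100", "11110"}
G_U = pcr_graph(U)

print(frozenset({(1, 1), (4, 0)}) in G_U)  # True: amplification expected
print(frozenset({(0, 0), (2, 0)}) in G_U)  # False: no amplification
```

In the laboratory setting the set `G_U` would come from the observed PCR results rather than from knowledge of U.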
4.2  The  Identification  Algorithms

To identify U from $G_U$, two approaches are taken. The methods are generalizations of
those of combinatorial group testing [18-24]. The first is called the disjunct algorithm, which
identifies the strands in MEM_LIB(n,q,n·t) that are surely not in U. The second is called edge
representative decoding, which identifies the strands in MEM_LIB(n,q,n·t) that surely are in U.

Call the disjoint sets of strands identified by these algorithms the resolved negatives and
resolved positives, denoted RN and RP respectively. From the definitions of these sets, then:

$$RP \subseteq U \subseteq L_{n,q}(S) - RN. \qquad (1)$$

Hence, if $RP = L_{n,q}(S) - RN$, then $U = RP = L_{n,q}(S) - RN$.

The disjunct algorithm is simple to state:

Disjunct Algorithm: Any sequence $x \in [q]^n$, thought of as a complete subgraph $K_x$ in $G_{n,q}$,
that has an edge that does not appear in $G_U$ is a member of RN.

The disjunct algorithm works because every edge of every $x \in U$ corresponds to a
positive PCR reaction.

The edge representative decoding is a little more complicated.

Edge Representative Decoding: Any sequence $x \in [q]^n$, thought of as a complete subgraph $K_x$
in $G_{n,q}$, that is also a complete subgraph in $G_U$ and that has an edge that is not contained in any
other complete subgraph $K_{x'}$ in $G_U$ is a member of RP. In other words, $x \in RP$ if and only if
$K_x$ is a complete subgraph of $G_U$ that has an edge that is not contained in any other complete
subgraph $K_{x'}$ in $G_U$ with $K_{x'} \ne K_x$.

Edge representative decoding works because every edge in $G_U$ is contained in a
complete subgraph $K_x$ for some $x \in U$. Thus if an edge in $G_U$ is contained in a unique complete
subgraph $K_x$ of $G_U$, then that subgraph must be $K_x$ for some $x \in U$.
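Both decoding rules can be sketched as a brute-force search over the whole library. The Python below is an illustrative sketch under that assumption (exhaustive enumeration of $[q]^n$, practical only for small n and q); it is not the project's decoding software, and the names `k_edges`, `pcr_graph` and `decode` are hypothetical.

```python
from itertools import combinations, product


def k_edges(x) -> set:
    """Edges of the complete subgraph K_x; x is a tuple of values by position."""
    return {frozenset(edge) for edge in combinations(list(enumerate(x)), 2)}


def pcr_graph(pool) -> set:
    """G_U as the union of K_x over the pool."""
    return set().union(*(k_edges(x) for x in pool))


def decode(g_u: set, n: int, q: int):
    """Return (RP, RN) from G_U by exhaustive search over [q]^n."""
    library = list(product(range(q), repeat=n))
    # Disjunct algorithm: any x with an edge missing from G_U is in RN.
    candidates = [x for x in library if k_edges(x) <= g_u]
    rn = set(library) - set(candidates)
    # Edge representative decoding: x is in RP if K_x owns an edge that no
    # other complete subgraph of G_U contains.
    rp = set()
    for x in candidates:
        others = set().union(*(k_edges(y) for y in candidates if y != x))
        if k_edges(x) - others:
            rp.add(x)
    return rp, rn


U = {(1, 1, 0, 0, 0), (0, 0, 1, 1, 0), (1, 1, 1, 0, 0), (1, 1, 1, 1, 0)}
rp, rn = decode(pcr_graph(U), n=5, q=2)
print(rp == U, len(rn))  # True 28
```

For this pool RP equals the library minus RN, so U is fully resolved. A pool such as {000, 011, 101, 110} with n = 3 and q = 2 makes every edge positive, and both RP and RN come back empty, which shows that the two rules do not always determine U.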
4.3  Algorithmic Implementation

In this section the graph-theoretic algorithms on the abstract PCR graph that are implemented in
the PCR data decoding software are given. Consider information storage as a set of data values,
$S = \{s_k\}$. Each data value $s_k$ is a ComDMem and each ComDMem is a set of ordered pairs, i.e.,
$s_k = \{(i,j)\}_{0 \le i < N,\ 0 \le j < q}$, where each ordered pair (i,j) is a table-mer. The fundamental
question now becomes: how to effectively implement the representative decoding so that S can be found.

The algorithms for reconstructing S are graph-theoretic in nature, so one must
reformulate the problem. Let $G^0 = (V^0, E^0)$ be our PCR graph, i.e., vertices are table-mers,
edges are positive PCR reactions and a ComDMem is a clique of size N.

Let $G^i = (V^i, E^i)$ be the graph with the following properties:

1) Each vertex $v \in V^i$ has associated with it a set of pairs {(i,j)} with $0 \le i < N$ and
$0 \le j < q$.

2) If (i,j) and (k,l) are distinct pairs in the set associated with a vertex $v \in V^i$, then $i \ne k$.

3) If the vertices of an edge $(v,w) \in E^i$ have associated sets of pairs $p_v$ and $p_w$, and
(i,j), (k,l) are distinct pairs in $p_v \cup p_w$, then $i \ne k$.

This $G^i = (V^i, E^i)$ graph type is an extension of the PCR graph $G^0 = (V^0, E^0)$. Figure 11
illustrates the extension. Instead of each node having just one pair (i,j), it has a set of pairs
{(i,j)}. To make viewing easier, each set of pairs is represented by an N-tuple. For example,
{(2,0)} is represented by (*,*,0), which represents an unfolding ComDMem. Each vertex
$v \in V^i$ is equivalent to a subset of the pairs from a data entry $s_k \in S$, in other words a partial or
fuzzy ComDMem. Each edge is equivalent to positive PCR reactions. One can construct a
sequence of graphs satisfying the properties above using Algorithm 1 given in Figure 12.
[Figure 11 panels: $G^{i-1}$; $G^i$]

Figure 11: $G^{i-1}$ to $G^i$ Graph Extension Scheme


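Let p[w] be the sets of pairs associated with node w.

// Each edge in G^(i-1) generates a node in G^i
for all (u^(i-1), v^(i-1)) ∈ E^(i-1) do
    Create a new node w^i ∈ V^i.
end for

// Small cliques in G^(i-1) generate edges in G^i
for all (u^(i-1), v^(i-1)) ∈ E^(i-1) do
    // N_1 ⊂ V^(i-1) are the neighbors of u^(i-1)
    N_1 ← ∅
    for all w^(i-1) ∈ V^(i-1) such that (u^(i-1), w^(i-1)) ∈ E^(i-1) do
        N_1 ← N_1 ∪ {w^(i-1)}
    end for
    // N_2 ⊂ V^(i-1) are the neighbors of v^(i-1)
    N_2 ← ∅
    for all w^(i-1) ∈ V^(i-1) such that (v^(i-1), w^(i-1)) ∈ E^(i-1) do
        N_2 ← N_2 ∪ {w^(i-1)}
    end for
    // N ⊂ V^(i-1) are the neighbors of both u^(i-1) and v^(i-1)
    N ← N_1 ∩ N_2
    u^i ← node in V^i created from (u^(i-1), v^(i-1))
    for all w^(i-1) ∈ N do
        // u^(i-1), v^(i-1), w^(i-1) ∈ V^(i-1) form a 3-clique.
        v^i ← node in V^i created from (u^(i-1), w^(i-1))
        w^i ← node in V^i created from (v^(i-1), w^(i-1))
        E^i ← E^i ∪ (v^i, w^i)
        for all x^(i-1) ∈ N - {w^(i-1)} do
            if (w^(i-1), x^(i-1)) ∈ E^(i-1) then
                // u^(i-1), v^(i-1), w^(i-1), x^(i-1) ∈ V^(i-1) form a 4-clique.
                x^i ← node in V^i created from (w^(i-1), x^(i-1))
                E^i ← E^i ∪ (u^i, x^i)
            end if
        end for
    end for
end for

Figure 12: $G^{i-1}$ to $G^i$ Graph Extension Scheme Algorithm 1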
Starting the sequence with the original PCR graph $G^0$, Algorithm 1 provides a means of
finding all ComDMem cliques in $G^0$. The idea is that each edge in $G^{i-1}$ generates a node
in $G^i$. Two nodes have an edge in $G^i$ if their constituent edges from $G^{i-1}$ form a
clique of size 3 or 4. Figure 13 illustrates the application of Algorithm 1.
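
One extension step can be sketched in Python as follows. This is a brute-force variant that tests every pair of edges directly instead of intersecting neighbor sets as Figure 12 does, and the data layout (node ids mapped to sets of (position, value) pairs, edges as two-element frozensets) is an assumption made for illustration.

```python
from itertools import combinations


def extend_graph(nodes, edges):
    """One step G^{i-1} -> G^i.

    nodes: dict mapping node id -> frozenset of (position, value) pairs
    edges: set of frozenset({a, b}) over node ids
    """
    adjacent = {v: set() for v in nodes}
    for edge in edges:
        a, b = tuple(edge)
        adjacent[a].add(b)
        adjacent[b].add(a)

    # Each edge of G^{i-1} becomes a node of G^i carrying the union of pair sets.
    new_nodes = {}
    for edge in edges:
        a, b = tuple(edge)
        new_nodes[edge] = nodes[a] | nodes[b]

    # Two new nodes are joined when their constituent edges span a 3- or 4-clique.
    new_edges = set()
    for e1, e2 in combinations(edges, 2):
        vertices = e1 | e2  # 3 vertices if the edges share an endpoint, else 4
        if all(b in adjacent[a] for a, b in combinations(vertices, 2)):
            new_edges.add(frozenset({e1, e2}))
    return new_nodes, new_edges


# G^0 for the single data entry (0, 0, 0): a triangle on three (position, value) nodes.
g0_nodes = {(p, 0): frozenset({(p, 0)}) for p in range(3)}
g0_edges = {frozenset(pair) for pair in combinations(g0_nodes, 2)}
g1_nodes, g1_edges = extend_graph(g0_nodes, g0_edges)
print(len(g1_nodes), len(g1_edges))  # 3 3
```

Comparing every pair of edges costs time quadratic in the number of edges, which is acceptable for a toy graph but illustrates why repeated extension becomes expensive on a full PCR graph.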


Figure 13: How Algorithm 1 Leads to ComDMem Decoding
Successive application of Algorithm 1 would eventually lead us to all the cliques in the
original graph $G^0$, but at great computational cost. Instead, by applying the unique edge
representative method, one can take advantage of the fact that $G^0$ was constructed as the
graph-theoretic union of cliques of size $N$, i.e., ComDMems. Examine the edge between
$(*,0,*)$ and a second node in $G^0$ from Figure 13. The nodes and the intersection of their
neighbors form exactly one data entry $(0, 0, 0)$. Since each edge comes from at least one data
entry, this means that a ComDMem has been found. This edge searching algorithm, presented in
Algorithm 2, allows us to find entries in $S$ early in the sequence of graphs
$G^0, G^1, \ldots$. Data entries found this way in $G^0$ must be in the original data set $S$.


Let p[w] be the set of pairs associated with node w.

S_i ← ∅
for all (u^i, v^i) ∈ E^i do
    P ← p[u^i] ∪ p[v^i]
    for all w^i ∈ V^i such that (u^i, w^i), (v^i, w^i) ∈ E^i do
        P ← P ∪ p[w^i]
    end for
    if |P| = N and (i, j), (l, m) ∈ P ⇒ i ≠ l then
        S_i ← S_i ∪ {P}
    end if
end for

Figure 14: Unique Edge Representative Computational Cost Reduction (Algorithm 2)
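
A minimal Python sketch of the edge representative search on the level-0 graph is given below. It assumes each node of $G^0$ is a single (position, value) pair, and builds the graph as the union of one clique per data entry. The helper names `pcr_graph` and `edge_representatives` are hypothetical.

```python
from itertools import combinations


def pcr_graph(entries):
    """Build the edge set of G^0: one clique per data entry (a tuple of values)."""
    edges = set()
    for entry in entries:
        clique = list(enumerate(entry))  # (position, value) nodes of this entry
        edges.update(frozenset(pair) for pair in combinations(clique, 2))
    return edges


def edge_representatives(edges, n):
    """Pool each edge's endpoints with their common neighbors; keep pools that
    hold exactly n pairs with pairwise distinct positions."""
    adjacent = {}
    for edge in edges:
        a, b = tuple(edge)
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)

    found = set()
    for edge in edges:
        u, v = tuple(edge)
        pool = {u, v} | (adjacent[u] & adjacent[v])
        positions = {position for position, _ in pool}
        if len(pool) == n and len(positions) == n:
            found.add(tuple(value for _, value in sorted(pool)))
    return found


data = {(0, 0, 0), (1, 1, 0)}
print(sorted(edge_representatives(pcr_graph(data), n=3)))
# [(0, 0, 0), (1, 1, 0)]
```

Because every edge lies in at least one stored entry, the pool always contains that entry, so a pool of exactly $N$ pairs can only be the entry itself.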
4.4  The  Algorithms  Applied  to  Physical  Experimental  Data 

To exhibit the above algorithms, actual dsDNA experiments were performed on
MEM_LIB(5, 2, 50) for Figure 4. A description of the physical construction of this
MEM_LIB(5, 2, 50) appears in [8]. From MEM_LIB(5, 2, 50), the four sequences that were
selected and taken as U are given in Figure 10. To actually select strands from this library, a
cloning method was used. The library was amplified with outside primers, the amplified product
was cut with BamHI and HindIII, the expected fragment was purified and then ligated into the
vector pBluescript [8]. Four of a total of 12 isolated clones were selected to be the pooled
sample U. Before these four strands (clones) were pooled, the individual dsDNA were
sequenced to determine which library members were actually selected. To exhibit the
experimental design and analysis, an incidence matrix is useful and is given in Figure 15. In the
actual experiments, essentially no PCR errors occurred and the empirical outcomes seen in
Figure 15 were in 100% agreement with the theoretical outcomes that can be found in the last
two rows of Figure 15. This is a testament to the SynDCode design method. A portion of the gel
output of the actual PCR experiments performed on this U is given in Figure 16.
Figure 15: Record of Actual PCR Results

The sequences in MEM_LIB(5, 2, 50) are given vertically as labels for the columns of the
incidence matrix in Table 3 and are numbered 1-32. The sequences in U are distinguished by
bold-faced fonts and are in columns 7, 25, 29, 31. The PCR reactions, i.e., the edges in
$G_{5,2}$, correspond to the rows and are numbered 1-40. The edge labels are given in either
the positive or negative PCR columns depending upon whether the given edge is positive or
negative for U. Every entry in the matrix corresponds to a pair (PCR reaction, sequence). There
is a 1 in a given entry (i, j) if and only if the sequence j is (theoretically) positive for PCR
reaction i. Using our mathematical representation, each entry corresponds to a pair (edge,
complete subgraph) and there is a 1 in that entry if the given edge is contained in the given
complete subgraph. The disjunct algorithm uses only the negative PCR reactions, which are
listed in the last column, and the edge representative decoding algorithm uses only the positive
PCR reactions, which are given in the penultimate column. In the actual experiment, whose raw
results can be seen in Figure 5, the pooled dsDNA sample is separately exposed to all forty
pairs of PCR primers.

Using Figure 15 and focusing on the disjunct algorithm, sequences 9-16 are in RN by
virtue of PCR reaction 2 because each of the sequences 9-16 contains PCR reaction 2 as an edge
and PCR reaction 2 was negative for the given U. Thus columns 9-16 are labeled rn(2), which is
meant to denote that these sequences are in RN by virtue of PCR reaction 2 being negative.
Other PCR reactions may also indicate that these sequences are in RN, but PCR reaction 2 is the
first in our ordering to do so. Similarly, sequences 17-24 are labeled rn(3), sequences 1-4 are
labeled rn(5), sequences 5-6 are labeled rn(9), sequence 8 is labeled rn(14), sequences 26, 28,
30, 32 are labeled rn(16) and sequence 27 is labeled rn(30). Thus RN = {1-4, 5-6, 8, 9-16,
17-24, 26, 27, 28, 30, 32}.
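
The disjunct elimination can be replayed in a few lines of Python. Two conventions are assumed here rather than taken from the report: column k of the library is the 5-bit binary expansion of k - 1, and the forty reactions are numbered in lexicographic order of their position pairs and value pairs. Under these assumptions the sketch recovers columns 7, 25, 29 and 31 as the only sequences not eliminated.

```python
from itertools import combinations, product

N, Q = 5, 2
# Library columns 1..32: column k holds the 5-bit binary expansion of k - 1 (assumed).
library = {k + 1: bits for k, bits in enumerate(product(range(Q), repeat=N))}
# PCR reactions 1..40 in lexicographic order of (positions, values) (assumed).
reactions = {}
for p1, p2 in combinations(range(N), 2):
    for v1, v2 in product(range(Q), repeat=2):
        reactions[len(reactions) + 1] = ((p1, v1), (p2, v2))


def contains(sequence, reaction):
    """True when the sequence carries both (position, value) pairs of the reaction."""
    return all(sequence[position] == value for position, value in reaction)


pooled = [7, 25, 29, 31]  # columns of U
negative = [r for r, edge in reactions.items()
            if not any(contains(library[c], edge) for c in pooled)]

# Label each eliminated column with the first negative reaction that contains it.
rn_label = {}
for column, sequence in library.items():
    for r in negative:
        if contains(sequence, reactions[r]):
            rn_label[column] = r
            break

resolved_positive = sorted(set(library) - set(rn_label))
print(resolved_positive)  # [7, 25, 29, 31]
```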
Using Figure 15 and focusing on the edge representative decoding, sequence 7 is identified
as being in RP, because the complete graph labeled by x = 00110, the column 7 label, is the
only complete subgraph of the positive PCR graph that contains the edge {(0,0), (1,0)}, which
denotes the positive PCR reaction 1. Thus column 7 is labeled RP(1), which is meant to denote
that this sequence is in RP by virtue of PCR reaction 1 being positive. Other PCR reactions may
also indicate that this sequence is in RP, but PCR reaction 1 is the first in our ordering to do
so. Similarly, sequences 25, 31 and 29 are respectively identified by the positive PCR reactions
7, 12, and 31, and columns 25, 31 and 29 are respectively labeled RP(7), RP(12), RP(31). Since
$RP = L_{5,2}(S) - RN$, it follows that

$U = RP = L_{5,2}(S) - RN$.
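
The positive-reaction side can be replayed the same way. The sketch below again assumes that column k is the binary expansion of k - 1 and that reactions are numbered lexicographically. It pools each positive edge with the common neighbors of its endpoints and records the first reaction that pins down each column.

```python
from itertools import combinations, product

N, Q = 5, 2
library = {k + 1: bits for k, bits in enumerate(product(range(Q), repeat=N))}
column_of = {bits: k for k, bits in library.items()}
reactions = {}
for p1, p2 in combinations(range(N), 2):
    for v1, v2 in product(range(Q), repeat=2):
        reactions[len(reactions) + 1] = ((p1, v1), (p2, v2))

# A reaction is positive when at least one pooled sequence carries both of its pairs.
pooled = [library[c] for c in (7, 25, 29, 31)]
positive = {r: edge for r, edge in reactions.items()
            if any(all(s[p] == v for p, v in edge) for s in pooled)}

# Graph of positive reactions: nodes are (position, value) pairs.
adjacent = {}
for u, v in positive.values():
    adjacent.setdefault(u, set()).add(v)
    adjacent.setdefault(v, set()).add(u)

# First positive reaction whose endpoints and common neighbors form one full sequence.
rp_label = {}
for r, (u, v) in positive.items():
    pool = {u, v} | (adjacent[u] & adjacent[v])
    if len(pool) == N and len({p for p, _ in pool}) == N:
        sequence = tuple(value for _, value in sorted(pool))
        rp_label.setdefault(column_of[sequence], r)

print(rp_label)  # {7: 1, 25: 7, 31: 12, 29: 31}
```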


Figure  16:  A  Portion  of  the  Electrophoresis  Gel  from  the  PCR 


Figure 16 gives a portion of the electrophoresis gel from the PCR reactions whose
positive and negative results are recorded in Figure 15. The lanes where bands can be seen are
positive PCR reactions for the encoded primers given at the bottom of the lane. The row number
of Figure 15 that corresponds to a lane appears in Figure 16 directly below the encoded primers
for the given lane. In all, there were forty separate PCR reactions, one for each of the forty
primer pairs, each with the same dsDNA sample U. Each reaction occurred in a separate well,
with each well corresponding to a distinct lane in the gel.
4.5 The General Setting, Parameters and Simulated Performance

In general, the size of MEM_LIB(n, q, n·t) is $q^n$ and the number of PCR reactions for
this library is $\binom{n}{2} q^2$; for MEM_LIB(5, 2, 50) this gives the 32 sequences and 40
reactions of Section 4.4. In Figure 17, the outcome of simulated performance is given and
compared.


Memory Size, # pairwise associations ($q^n = 6^9$)     ~$10^7$
Simultaneous Records Accessed (picked)                 20
Simultaneous Associations Accessed @15                 300
Accuracy                                               97%
SynDCode Strands Required (nq)                         54
Length of DNA Memory Strands                           180
Number of DNA Library Strands                          ~$10^7$
Number of PCR Reaction Wells                           324
Number of PCR Reactions                                2916
Computer Clock Cycles Required                         $O(10^8)$

DNA data structure requires $O(n^2 q^2)$ PCR reactions; standard computer requires
$O(n^2 q^n)$ clock cycles.

Figure 17: Simulations of Algorithmic Performance
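
The parameter arithmetic can be checked with a short sketch. It assumes one PCR reaction per edge of the PCR graph (a pair of distinct positions with one value at each), which reproduces the forty reactions of the MEM_LIB(5, 2, 50) experiment. The values n = 9 and q = 6 for the simulated library are inferred from the figure's $6^9$ and nq = 54, not stated outright in the text.

```python
from math import comb


def library_size(n, q):
    """Number of distinct memory strands: one value out of q at each of n positions."""
    return q ** n


def pcr_reaction_count(n, q):
    """One reaction per edge: a pair of distinct positions and a value at each."""
    return comb(n, 2) * q ** 2


print(library_size(5, 2), pcr_reaction_count(5, 2))  # 32 40
print(library_size(9, 6))                            # 10077696
print(9 * 6, (9 * 6) ** 2)                           # 54 2916
```

The figure's count of 2916 reactions coincides with $(nq)^2$ for these inferred values, which is in line with the $O(n^2 q^2)$ growth stated in the figure, while the per-edge count for the same values is smaller.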
4.6 Visualization with 3D Views

The system 3D Views was created to provide visualization of oligo interactions. A model
was created to represent the physicality of oligos, their structure, movement and interactions. A
user interface was created to graphically display the model results. For DNA libraries that are
large enough to be of interest, the model graphics output produces large and detailed images.
Current computer screens do not have sufficient pixel densities to display the level of detail
desired from a simulation. A tiled display system was built to increase the size of the total
display without giving up fine detail.

The oligo shape was modeled as an elongated ellipsoid with short axis $a$ and long axis $b$.
For gross movement this approximation is justified by the rigidity of short oligos and the shape
of the polar charge. Oligo movement was modeled by a three-dimensional Brownian motion random
walk. The one-dimensional diffusion coefficient $D$ for the ellipsoid shape with three
independent directions is:

$D$ = [expression illegible in the source]    (2)

where $T$ is temperature, $k_B$ is Boltzmann's constant, and $\eta$ is the viscosity of the
medium. The random walk motion is modeled by assuming the oligo is on a three-dimensional
lattice and may move a step distance $dl$ in a step time $dt$. In $m$ time steps, the oligo will
move $n$ grid points with equal probability. In the random walk, the Brownian motion is
approximated by:

$D$ = [expression illegible in the source]    (3)

From these two equations, the motion of a group of oligos was mapped through space by the
motion model.
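
The lattice random walk can be illustrated with a small simulation. Equation (3) is not legible in the source, so the sketch assumes the textbook one-dimensional relation $D = dl^2 / (2\,dt)$ and independent steps of $\pm dl$ along each of the three axes. The report's own expression may differ, and the walker count, step count and seed are arbitrary choices.

```python
import random


def simulate_msd(walkers, steps, dl, rng):
    """Mean squared displacement per axis for independent +/-dl steps on each axis."""
    total = 0.0
    for _ in range(walkers):
        position = [0.0, 0.0, 0.0]
        for _ in range(steps):
            for axis in range(3):
                position[axis] += dl if rng.random() < 0.5 else -dl
        total += sum(x * x for x in position)
    return total / (3 * walkers)


dl, dt, steps = 1.0, 1.0, 100
rng = random.Random(0)
msd = simulate_msd(walkers=1000, steps=steps, dl=dl, rng=rng)

# In one dimension <x^2> = 2 D t, so D is estimated from the per-axis displacement.
d_estimate = msd / (2 * steps * dt)
d_lattice = dl ** 2 / (2 * dt)
print(d_estimate, d_lattice)
```

Matching the lattice value to a physical diffusion coefficient then fixes the ratio $dl^2 / dt$, which is how a step size and time step would be chosen for a given oligo.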


Reactions between two or more oligos that land on the same grid point were modeled by
assuming a diffuse solution with a Boltzmann distribution in the probability of oligos landing on
a grid point. The reaction between multiple oligos landing on the same point was modeled by
the Boltzmann distribution for interaction states, where the probability $P_j$ of state $j$ is:

$$P_j = \frac{\exp\left(-\Delta G_j / k_B T\right)}{Z}, \quad \text{where} \qquad (4)$$

$$Z = \sum_i \exp\left(-\Delta G_i / k_B T\right) \qquad (5)$$

$T$ is the temperature, $k_B$ is Boltzmann's constant, and $\Delta G_j$ is the free energy
difference associated with state $j$. The oligo model and design tool SynDCode was used to
approximate $\Delta G_j$ [17].
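
Equations (4) and (5) translate directly into code. The sketch below is illustrative only: the free energy values are made up, expressed in joules per molecule, and in the report they would come from SynDCode instead.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # k_B in joules per kelvin


def state_probabilities(delta_g, temperature):
    """P_j = exp(-dG_j / (k_B T)) / Z for a list of free energy differences in joules."""
    kt = BOLTZMANN_J_PER_K * temperature
    lowest = min(delta_g)
    # Shifting by the lowest energy leaves all ratios unchanged and avoids overflow.
    weights = [math.exp(-(g - lowest) / kt) for g in delta_g]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical states at 0, -2 and -5 units of k_B T at 298.15 K.
kt_room = BOLTZMANN_J_PER_K * 298.15
probabilities = state_probabilities([0.0, -2.0 * kt_room, -5.0 * kt_room], 298.15)
print([round(p, 3) for p in probabilities])
```

The state with the most negative $\Delta G_j$ receives the largest probability, and the probabilities sum to one by construction.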

Rendering the model output was a significant issue due to the large number of objects to be
displayed on the computer screen. High resolution was needed to view individual hybridization
reactions. To view the various kinetics permutations, a large number of grid points were needed.
A tiled display system, the Mobile Stream Processing Cluster (MOSAIC), was built to aid in
visualization of the system. The finished modeling cluster and display system is shown in Figure
18. A set of nine 1920 x 1200 pixel monitors was tiled 3 x 3 on a stand which also holds the
computer cluster and power supplies. The result was a continuous 5760 x 3600 pixel display.
Figure 18: MOSAIC Cluster


To run the visualization model and drive the display, three 8-core Apple Mac Pros with
32 GB of RAM each were used. Red Hat Enterprise Linux version 5.x was used as the operating
system. Each Mac Pro was given three ATI Radeon graphics cards, one for each monitor in the
tiled display. The computers were connected with 10 Gb Ethernet.

The  oligo  interaction  model  run  on  the  cluster  creates  a  continuous  series  of  OpenGL 
calls  that  represents  the  graphical  output  of  the  model.  The  distributed  graphics  processing 
application  Chromium  was  used  to  render  the  graphical  output  across  the  nine  displays  in  real 
time.  The  result  was  a  high  fidelity  physical  model  of  the  diffusion  and  interaction 
thermodynamics  of  a  large  set  of  oligos  and  a  9x  improvement  in  resolution  in  display  of  the 
model  output. 

The MOSAIC cluster has been transitioned to three projects to date. It is home to the
Distributed Quantum Computing simulation work, where multi-thread and parallel processing are
blended to reduce latency and maximize information exchange between the systems. It is also the
main demonstration platform for the SWATHBUCKLER project, which requires the use of
MOSAIC's nine high-definition displays to view wide area Synthetic Aperture Radar data.
Finally, it supports an Air Force Research Laboratory neuromorphic computing camera project
which will eventually use the nine-tile display to view different algorithmic approaches to
computing in a neuromorphic design.
5.0 CONCLUSIONS


This project developed a synthetic DNA-based associative memory called ComDMems
that, unlike conventional silicon-based associative memories, provides a high degree of input
parallelization, allowing a significant reduction in required data structure queries.

This innovation combines mathematics and molecular biology. First, it uses mathematics
to design the synthetic DNA that makes the storage of information in ComDMem possible. Then
it uses the specificity of DNA strand recognition and the wet laboratory method of polymerase
chain reaction (PCR) to store information and to generate a signal. Finally, it uses mathematics
to decode the PCR signal, identify the ComDMem signatures and reveal the information and
associations they contain.

By using mathematical combinations of short "covering strands" in place of each single,
longer memory strand, covert ComDMems can encode a vast amount of information more
efficiently, and this encoded information can be retrieved only by an authorized user.
A uniform method of covering strand construction was given that minimizes the number of
covering strands and theoretically and experimentally mimics the behavior of the longer memory
strands. This project demonstrated a method of decoding the PCR output that minimizes the
number of PCR reactions for a given number or distribution of superimposed or associated
ComDMems.

These synthetic ComDMems are feasibly functional at concentrations that are below the
parts per billion level. Thus, they could not be reverse engineered, because their detection
would only be possible with prior knowledge of the memory-specific DNA sequences required for
PCR amplification. Hence, ComDMem synthetic DNA memories are highly covert. ComDMems can
also encode item or process information as a numerical sequence in DNA, are capable of carrying
a virtually unlimited number of data fields, and are deeply superimposable and thus associative.
In general, the decoding of associative DNA memory has been an intractable problem for
processes requiring deeply superimposed memories. However, ComDMems are constructed in a
sophisticated combinatorial manner so that the decoding of such deeply associative memories is
feasible. Thus, beyond being covert and information-rich, our DNA memories can enable the
design of efficient, scalable and technically useful libraries of synthetic DNA for use in
high-performance associative memory.


6.0  REFERENCES 


1.  Adleman,  L.  M.,  “Molecular  Computation  of  Solutions  to  Combinatorial  Problems,”  Science, 
266,  1994,  pp.  1021-1024. 

2.  Head,  T.  and  Gal,  S.  "Aqueous  Computing:  Writing  Into  Fluid  Memory,"  Bulletin  of  the 
European  Association  for  Theoretical  Computer  Science,  75,  2001,  pp.  190-198. 

3. Frutos, A. G. et al. “Demonstration of a Word Design Strategy for DNA Computing on
Surfaces,” Nucleic Acids Research, 25, 1997, pp. 4748-4756.

4.  Murphy,  D.,  “Gene  Expression  Studies  Using  Microarrays:  Principles,  Problems,  and 
Prospects,”  Advances  in  Physiology  Education,  26,  2002,  pp.  256-270. 

5.  Winfree,  E.,  et  al.  “Design  and  Self-Assembly  of  Two-Dimensional  DNA  Crystals,”  Nature, 
394,  1998,  pp.  539-544. 

6.  Braun,  E.,  et  al.  “DNA-Templated  Assembly  and  Electrode  Attachment  of  a  Conducting  Silver 
Wire,”  Nature,  391,  1998,  pp.  775-778. 

7. Whitesides, G. M. and Boncheva, M., “Beyond Molecules: Self-Assembly of Mesoscopic and
Macroscopic Components,” Proc. Natl. Acad. Sci., 99, 2002, pp. 4769-4774.

8.  Gal,  S.,  Monteith,  N.,  Macula,  A.  J.,  “Successful  Preparation  and  Analysis  of  a  5-site  2- 
Variable  DNA  Library”,  Natural  Computing,  8 , 2009,  333  -  347. 

9.  Brenner,  S.,  “Methods  for  Sorting  Polynucleotides  Using  Oligonucleotide  Tags”,  U.S.  Patent 
No.  5,604,097,  1997 

10. Brenner, S. et al., “Gene Expression Analysis by Massively Parallel Signature Sequencing
(MPSS) on Microbead Arrays”, Nat. Biotechnol., 18, 2000, pp. 630-634.

11. Cai, H., P. White, D. Torney, A. Deshpande, Z. Wang, B. Marrone, and J. Nolan, “Flow
Cytometry-Based Minisequencing: A New Platform for High Throughput Single Nucleotide
Polymorphism Scoring”, Genomics, 66, 2000, pp. 135-143.

12.  Ibrahim,  Z,  et  al.  “A  New  Readout  Approach  in  DNA  Computing  Based  on  Real-Time  PCR 
with  Taqman  Probes”,  (C.  Mao  and  T.  Yokomori  Eds.),  DNA  12:  Lecture  Notes  in  Computer 
Science  4287,  2006,  pp.  350-359. 


13.  Yamamoto,  M.,  et  al.,  “Large-Scale  DNA  Memory  Based  on  the  Nested  PCR”,  Natural 
Computing,  7,  2008,  pp.  335-346. 

14. Hall, B., et al., “Survival and Polymerase Chain Reaction-Based Detection of Nucleic Acid
Taggant Markers During Bacterial Growth and Sterilization”, Analytica Chimica Acta, 475,
2003, pp. 67-73.

15. Mullis, K., et al., “The Polymerase Chain Reaction”, Birkhäuser, Boston, 1994.

16.  Valasek,  M.  A.,  Repa,  J.  J.,  “The  Power  of  Real-Time  PCR”.  Advan.  Physiol.  Edu.  29,  2005, 
pp.  151-159. 

17.  M.  A.  Bishop,  A.  J.  Macula,  T.  E.  Renz,  SynDCode  Suite,  2006, 
http://syndcode.geneseo.edu. 

18.  Du,  D.  Z.  and  Hwang,  F.  K.,  “Combinatorial  Group  Testing  and  Its  Applications”,  2nd  ed. 
World  Scientific,  2000.  Singapore. 

19.  Macula,  A.  J.,  “A  Simple  Construction  of  d-Disjunct  Matrices  with  Certain  Constant 
Weights”,  Discrete  Mathematics,  162,  1996,  pp.  311-312. 

20.  Macula,  A.  J.,  “Probabilistic  Nonadaptive  Group  Testing  in  the  Presence  of  Errors  and  DNA 
Library  Screening”,  Annals  of  Combinatorics,  3,  1999,  pp.  61-69. 

21. Macula, A. J., “Probabilistic Nonadaptive and Two-Stage Group Testing with Relatively
Small Pools and DNA Library Screening”, Journal of Combinatorial Optimization, 2, 1999, pp.
385-397.

22. A. Macula, et al., “Nonadaptive and Trivial Two-Stage Group Testing with $d^e$-Disjunct
Matrices, Entropy Search, and Complexity”, Bolyai Studies, 16, 2007, pp. 71-84, Springer.

24.  A.  Macula  and  L.  Popyack.,  “A  Group  Testing  Method  for  Finding  Patterns  in  Data”, 
Discrete  Appl.  Math.  144,  2004,  149-157. 

25.  A.  Macula,  et  al.,  “PCR  Nonadaptive  Group  Testing  of  DNA  Libraries  for  Biomolecular 
Computing  and  Taggant  Applications”,  Discrete  Mathematics,  Algorithms  and  Applications, 
Volume:  1,  Issue  1,  March  2009,  pp.59  -  69 


26.  A.  Macula,  et  al.,  “Random  Coding  Bounds  for  DNA  Codes  Based  on  Fibonacci  Ensembles 
of  DNA  Sequences”,  2008  IEEE  Proceedings  of  International  Symposium  on  Information 
Theory,  pp.  2292  -  2296 

27.  A.  Macula,  et  al.,  “New,  Improved,  and  Practical  k-Stem  Sequence  Similarity  Measures  for 
Probe  Design”,  Journal  of  Computational  Biology,  5,  June  2008,  pp.  525-34. 

28.  A.  Macula,  et  al.,  “Random  Coding  Bounds  for  DNA  Codes  Based  on  Fibonacci  Ensembles 
of  DNA  Sequences”,  2008  IEEE  Proceedings  of  International  Symposium  on  Information 
Theory,  pp.  2292  -  2296 


7.0  ACRONYMS 


CAM      Content Addressable Memory
DNA      Deoxyribonucleic Acid
dsDNA    double stranded DNA
MOSAIC   Mobile Stream Processing Cluster
PCR      Polymerase Chain Reaction
RAM      Random Access Memory
ssDNA    single stranded DNA
WC       Watson-Crick
A        Adenine
C        Cytosine
G        Guanine
T        Thymine